Towards improving ASR robustness for PSN and GSM telephone applications
Authors
Abstract
In real-life applications, errors in the speech recognition system are mainly due to inefficient detection of speech segments, unreliable rejection of Out-Of-Vocabulary (OOV) words, and insufficient account of noise and transmission channel effects. In this paper, we review a set of techniques developed at CNET in order to increase the robustness to mismatches between training and testing conditions. These techniques are divided into two classes: preprocessing techniques and Hidden Markov Models (HMM) parameters adaptation. The results of several experiments carried out on field databases, as well as on databases collected over PSN and GSM networks are presented. The main sources of errors are analyzed. We show that a blind equalization scheme significantly improves the recognition accuracy regarding both field and GSM data. Speech detection allows a system to delimit the boundaries of the words to be recognized. We also use preprocessing techniques to increase the robustness of such detectors to noisy GSM speech. We show that spectral subtraction improves speech detection under noisy GSM conditions. Bayesian adaptation of HMM parameters produces models which are robust to field and GSM conditions. Models robust to GSM conditions can also be generated by linear regression adaptation of HMM parameters. Our experiments show an equivalent performance obtained with both Bayesian and linear regression adaptation of HMM parameters. The results obtained also prove that HMM adaptation and preprocessing techniques can be advantageously combined to improve Automatic Speech Recognition (ASR) robustness. © 1997 Elsevier Science B.V.
Similar resources
Research and Development of Robust Speech Recognition
This paper describes recent research and development activities on robust ASR (automatic speech recognition) in NTT Human Interface Laboratories. ASR system design has been changing from the experimental to the commercial level. A relevant issue in achieving practical ASR is robustness against environmental noise and speaker/circuit differences. Adaptation techniques have been widely investigat...
Signal bias removal using the multi-path stochastic equalization technique
We propose using Hidden Markov Models (HMMs) associated with the cepstrum coefficients as a speech signal model in order to perform equalization or noise removal. The MUlti-path Stochastic Equalization (MUSE) framework allows one to process data at the frame level: it is an on-line adaptation of the model. More precisely, we apply this technique to perform bias removal in the cepstral domain in...
Audio-Visual Automatic Speech Recognition: An Overview
We have made significant progress in automatic speech recognition (ASR) for well-defined applications like dictation and medium vocabulary transaction processing tasks in relatively controlled environments. However, ASR performance has yet to reach the level required for speech to become a truly pervasive user interface. Indeed, even in “clean” acoustic environments, and for a variety of tasks,...
Automatic recognition of child speech for robotic applications in noisy environments
Automatic speech recognition (ASR) allows a natural and intuitive interface for robotic educational applications for children. However there are a number of challenges to overcome to allow such an interface to operate robustly in realistic settings, including the intrinsic difficulties of recognising child speech and high levels of background noise often present in classrooms. As part of the EU...
Avoiding distortions due to speech coding and transmission errors in GSM ASR tasks
In this paper, we have extended our previous research on a new approach to ASR in the GSM environment. Instead of recognizing from the decoded speech signal, our system works from the digital speech representation used by the GSM encoder. We have compared the performance of a conventional system and the one we propose on a speaker independent, isolated-digit ASR task. For the half and full-rate ...
Journal:
- Speech Communication
Volume 23, Issue
Pages -
Publication date: 1997